We are using an example data set used in the Appendix by Shumway and Stoffer (2017). Additional examples for code using R to explore time series data is also found in Hyndman and Athanasopoulos (n.d.). For this analysis we use tidyverse Wickham et al. (2019), lubridate Grolemund and Wickham (2011), PerformanceAnalytics Peterson and Carl (2020), Hmisc Harrell Jr, Charles Dupont, and others. (2020), astsa Stoffer (2020), ggfortify Horikoshi and Tang (2018), tseries Trapletti and Hornik (2020), forecast Hyndman and Khandakar (2008), and gridExtra Auguie (2017).
Time series data can be thought of as data consistently measured or observed at different points in time. In this way, the data have an implicit order or flow, and questions about time series data often involve questions about change over time.
#Libraries
library(tidyverse)
library(lubridate)
library(PerformanceAnalytics)
library(Hmisc)
library(astsa)
library(ggfortify)
library(tseries)
library(forecast)
library(gridExtra)
The AirPassenger dataset in the astsa package in R provides monthly totals of a US airline passengers, from 1949 to 1960. This data is already of a time series class therefore no further class or date manipulation is required.
data(AirPassengers)
AP <- AirPassengers
# Take a look at the class of the dataset AirPassengers
class(AP)
## [1] "ts"
The class object is ts which means it has inherent time structures. (Note that these time structures are not apparent with the head() command.)
# Take a look at the entries
AP
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432
head(AP)
## Jan Feb Mar Apr May Jun
## 1949 112 118 132 129 121 135
Initially, we want to check for a few key assumptions in the dataclass ts:
NA values# Check for missing values
sum(is.na(AP))
## [1] 0
# Check the frequency of the time series
frequency(AP)
## [1] 12
# Check the cycle of the time series
cycle(AP)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 1 2 3 4 5 6 7 8 9 10 11 12
## 1950 1 2 3 4 5 6 7 8 9 10 11 12
## 1951 1 2 3 4 5 6 7 8 9 10 11 12
## 1952 1 2 3 4 5 6 7 8 9 10 11 12
## 1953 1 2 3 4 5 6 7 8 9 10 11 12
## 1954 1 2 3 4 5 6 7 8 9 10 11 12
## 1955 1 2 3 4 5 6 7 8 9 10 11 12
## 1956 1 2 3 4 5 6 7 8 9 10 11 12
## 1957 1 2 3 4 5 6 7 8 9 10 11 12
## 1958 1 2 3 4 5 6 7 8 9 10 11 12
## 1959 1 2 3 4 5 6 7 8 9 10 11 12
## 1960 1 2 3 4 5 6 7 8 9 10 11 12
Next we can visualize the data by simply plotting the ts object.
# Plot the raw data using the base plot function
plot(AP,xlab="Date", ylab = "Passenger numbers (Thousands)",main="Air Passenger numbers from 1949 to 1961")
As an alternative to the base plot function, so we can also use the extension ggfortify in R, which works with the ggplot2 package. This way, we avoid having to convert to a dataframe as required with ggplot2, but still having access to the layering grammar of graphics.
autoplot(AP) + labs(x ="Date", y = "Passenger numbers (Thousands)", title="Air Passengers from 1949 to 1961") + theme_bw()
These basic plots provide a lot of information about the nature of the time series, in particularly with respect to periodicity and trend.
First, we can see from the plot a repeating peak-trough shape, that seems to repeat every year or so. This is evidence of periodic change and perhaps seasonality.
Second, we can see that the amplitude (distance from trough to peak) increases as time goes on. This may be evidence for a nonstationary time series.
Third, we can see evidence of a general trend. That is, overall, the number of passengers has been increasing from 1950 to 1960, even over the periodic fluctations.
We can further investigate seasonality with a boxplot:
#Let’s use the boxplot function to see any seasonal effects.
boxplot(AP~cycle(AP),xlab="Date", ylab = "Passenger Numbers (1000's)" ,main ="Monthly Air Passengers Boxplot from 1949 to 1961")
From these exploratory plots, we can make some initial conclusions:
The passenger numbers increase over time with each year which may be indicative of an increasing linear trend, perhaps due to increasing demand for flight travel and commercialization of airlines in that time period.
In the boxplot there are more passengers travelling in months 6 to 9 with higher means and higher variances than the other months, indicating seasonality with a apparent cycle of 12 months. The rationale for this could be more people taking holidays and fly over the summer months in the US.
The air passengers data appears to be multiplicative time series as the passenger numbers increase, it appears so does the pattern of seasonality.
There do not appear to be any outliers and there are no missing values. Therefore no data cleaning is required.
Continuing to use ggfortify, we can further explore the multiplicative nature of the time series uisng the decompose(data,type) command.
#With this model, we will use the decompose function in R. Continuing to use ggfortify for plots, in one line, autoplot these decomposed #components to further analyze the data.
decomposeAP <- decompose(AP,"multiplicative")
autoplot(decomposeAP)
In these decomposed plots we can again see the trend and seasonality as inferred previously, but we can also observe the estimation of the random component depicted under the “remainder”.
A stationary time series has three conditions: the mean, the variance and the covariance of a time series must not be functions of time. Strict stationarity…
In order to fit ARIMA models, the time series is required to be stationary. We will use two methods to test the stationarity.
In order to test the stationarity of the time series, let’s run the Augmented Dickey-Fuller Test using the adf.test function from the tseries R package.
For this test, the null hypothesis, \(H_0\) is that the time series is nonstationary, and the alternative hypothesis, \(H_A\) is that the time series is stationary (mean, variance, covariance are time independent).
adf.test(AP)
##
## Augmented Dickey-Fuller Test
##
## data: AP
## Dickey-Fuller = -7.3186, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary
If we set \(\alpha\) to 0.05, the results of the test suggest that time series is nonstationary. This assumption is supported by our initial visualizations.
We can also use autocorrelation to help assess if our time series is stationary.
The general thrust behind autocorrelation is the concept of a lag, or a delay between one point \(x_s\) and another point \(x_t\) in the same series. A lag of 1 means comparing a the original time series (\(x_1,x_2,...x_s\)) with its counter part (\(x_{1+1},x_{2+1},...,x_{s}\)). Note that the length of the string does change a forward or backward lag of any length (except 0). The correlation between the orginal time series and lagged counterparts form the basis for autocorrelation.
We will use autocorrelation function acf in from the base stats Rpackage. This function plots the correlation between a series and its lags with a 95% confidence interval in blue. If the autocorrelation value crosses the dashed blue line, it means that specific lag is significantly correlated with current series.
autoplot(acf(AP,plot=FALSE))+ labs(title="Correlogram of Air Passengers from 1949 to 1961")+
theme_bw()
The maximum at lag 1 or 12 months, indicates a positive relationship with the 12 month cycle.
Since we have already created a decomposeAP list object with a random component, we can plot the acf of the decomposeAP$random which is equivalent to plotting the remainder plot in our initial decomposition.
# Review random time series for any missing values
decomposeAP$random
## Jan Feb Mar Apr May Jun Jul
## 1949 NA NA NA NA NA NA 0.9516643
## 1950 0.9626030 1.0714668 1.0374474 1.0140476 0.9269030 0.9650406 0.9835566
## 1951 1.0138446 1.0640180 1.0918541 1.0176651 1.0515825 0.9460444 0.9474041
## 1952 1.0258814 1.0939696 1.0134734 0.9695596 0.9632673 1.0003735 0.9468562
## 1953 0.9976684 1.0151646 1.0604644 1.0802327 1.0413329 0.9718056 0.9551933
## 1954 0.9829785 0.9232032 1.0044417 0.9943899 1.0119479 0.9978740 1.0237753
## 1955 1.0154046 0.9888241 0.9775844 1.0015732 0.9878755 1.0039635 1.0385512
## 1956 1.0066157 0.9970250 0.9876248 0.9968224 0.9985644 1.0275560 1.0217685
## 1957 0.9937293 0.9649918 0.9881769 0.9867637 0.9924177 1.0328601 1.0261250
## 1958 0.9954212 0.9522762 0.9469115 0.9383993 0.9715785 1.0261340 1.0483841
## 1959 0.9825176 0.9505736 0.9785278 0.9746440 1.0177637 0.9968613 1.0373136
## 1960 1.0039279 0.9590794 0.8940857 1.0064948 1.0173588 1.0120790 NA
## Aug Sep Oct Nov Dec
## 1949 0.9534014 1.0022198 1.0040278 1.0062701 1.0118119
## 1950 0.9733720 1.0225047 0.9721928 0.9389527 1.0067914
## 1951 0.9397599 0.9888637 0.9938809 1.0235337 1.0250824
## 1952 0.9931171 0.9746302 1.0046687 1.0202797 1.0115407
## 1953 0.9894989 0.9934337 1.0192680 1.0009392 0.9915039
## 1954 0.9845184 0.9881036 0.9927613 0.9995143 0.9908692
## 1955 0.9831117 1.0032501 1.0003084 0.9827720 1.0125535
## 1956 1.0004765 1.0008730 0.9835071 0.9932761 0.9894251
## 1957 1.0312668 1.0236147 1.0108432 1.0212995 1.0005263
## 1958 1.0789695 0.9856540 0.9977971 0.9802940 0.9405687
## 1959 1.0531001 0.9974447 1.0013371 1.0134608 0.9999192
## 1960 NA NA NA NA NA
# Autoplot the random time series from 7:138 which exclude the NA values
autoplot(acf(decomposeAP$random[7:138],plot=FALSE))+ labs(title="Correlogram of Air Passengers Random Component from 1949 to 1961") +theme_bw()
We can use the stationarity exploration to then fit models to the data.
Since there is an upwards trend we will look at a linear model first for comparison. We plot AirPassengers raw dataset with a blue linear model.
autoplot(AP) + geom_smooth(method="lm")+ labs(x ="Date", y = "Passenger numbers (1000's)", title="Air Passengers from 1949 to 1961")
This may not be the best model to fit as it doesn’t capture the seasonality and multiplicative effects over time.
Instead, we might try an Auto Regressive Integrated Moving Average. This is a form of regression that uses lags to explain variation.
As a simple introduction we can use the auto.arima function from the forecast R package to fit the best model and coefficients, given the default parameters including seasonality as TRUE. Note we have used the ARIMA modeling procedure as referenced..
arimaAP <- auto.arima(AP)
arimaAP
## Series: AP
## ARIMA(2,1,1)(0,1,0)[12]
##
## Coefficients:
## ar1 ar2 ma1
## 0.5960 0.2143 -0.9819
## s.e. 0.0888 0.0880 0.0292
##
## sigma^2 estimated as 132.3: log likelihood=-504.92
## AIC=1017.85 AICc=1018.17 BIC=1029.35
Finally we can plot a forecast of the time series using the forecast function, again from the forecast R package, with a 95% confidence interval where \(h\) is the forecast horizon periods in months.
forecastAP <- forecast(arimaAP, level = c(95), h = 36)
autoplot(forecastAP)
Sometimes we might have a collection of time series that we might want to interpret together. For example, we can work with three data sets from Shumway and Stoffer’s 2017 book Shumway and Stoffer (2017).
The first data set is called cmort and captures the average weekly cardiovascular mortality in Los Angeles County from 1970 to 1980.
The second data set is called part and captures the average weekly particulate matter concentration in Los Angeles County from 1970 to 1980.
The third data set is called tempr and captures the average weekly temperature in Los Angeles County from 1970 to 1980.
We can instruct the R to read and format each data set using ts(data, start=(),frequency) arguments, where in the start we dictate the year and week of the data start, and the frequency specifies how many weeks (52) to consider. Using ts objects is very handy because we don’t have to store and manipulate dates (if they are regular interval), and visualization becomes easier.
## Load in and save data from astsa package
data(cmort)
cmort <- cmort
data(tempr)
tempr <- tempr
data(part)
part <- part
## Save as ts objects
cmort.ts <- ts(cmort, start = c(1970,1),frequency = 52)
part.ts <- ts(part, start = c(1970,1), frequency = 52)
tempr.ts <- ts(tempr, start=c(1970,1), frequency = 52)
The first thing we might want to do is simply visualize the data. Because the cmort,tempr and part time series are all on the same time scale (weekly, 1970-1980), we can plot them on the same plot.
#Set figure parameters
layout(matrix(c(1:3, 1:3), ncol=2), height=c(.5,.5, .5))
par(mar=c(.2,2,0,.5), oma=c(3.2,3,2.5,0.5), las=1)
#Particulate matter
plot(part.ts, type='n', xaxt='no', ylab='PM Conc.')
grid(lty=1, col=gray(.9))
lines(part.ts, col = "cadetblue3")
#Cardiovascular mortality
plot(cmort.ts, type='n', xaxt='no', ylab='Cardiac Mortality')
grid(lty=1, col=gray(.9))
lines(cmort.ts, col="orange3")
#Temperature
plot(tempr.ts, type='n', ylab="Temperature")
grid(lty=1, col=gray(.9))
lines(tempr.ts, col="forestgreen")
#Labels
mtext(side=1, "Year", line=2)
title(main = "Particulate Concentration, Cardiovascular Mortality, and Temperature", outer = T)
The first subplot (blue line) shows the weekly average particulate matter concentration. The next subplot (orange line) shows the weekly cardiovascular death count. The final line in green shows the average weekly temperature. From a descriptive stand point, a few things stand out about this plot:
All of these lines exhibit a periodicity or repetivite cycle of peaks and troughs.
The peaks in the particulate matter line (blue) and peaks in the cardiovascular mortality line (orange) are relatively similar – they seem to occur within close proximity of each other.
The green line (temperature) appears to be somewhat offset from the cardiovascular and particulate matter lines.
The amplitude appears to be decreasing for the cardiovascular mortality, and relatively constant for the other lines.
*Note the strong seasonal components in all of the series, corresponding to winter-summer variations and the downward trend in the cardiovascular mortality over the 10-year period (Shumway and Stoffer 2006, 2017).
We might want to start by simply visualizing the different time series together using a simple correlation measure for the different time series. The diagonal displays the histogram and kernel density estimates of the variables. The lower off-diagonal shows the bivariate scatterplots with a lowess fitted line, and the upper off-diagonal shows the Pearson’s \(R\) coefficient with significance levels (\(\cdot\) p < 0.1, ** p < 0.01, *** p < 0.001).
#Correlation, scatter plots, and bivariate regression
df <- cbind(Particulates=part,
Mortality=cmort,
Temperature=tempr)
chart.Correlation(df)
We might also consider looking for seasonality by either plotting weekly totals, or aggregating up to month.
## Weekly Aggregates
boxplot(cmort.ts~cycle(cmort.ts),xlab="Date", ylab = "Cardiovascular Mortality" ,main ="Weekly Cardiovascular Mortality Boxplot from 1949 to 1961")
boxplot(tempr.ts~cycle(tempr.ts),xlab="Date", ylab = "Temperature" ,main ="Weekly Temperature Boxplot from 1949 to 1961")
boxplot(part.ts~cycle(part.ts),xlab="Date", ylab = "Particulate Matter" ,main ="Weekly Particulate Matter Conc. Boxplot from 1949 to 1961")
We also might want to visualize the decomposition of the cardiovascular mortality time series– which, looks additve from our data.
cmort.decomp <- decompose(cmort.ts, "additive")
tempr.decomp <- decompose(tempr.ts,"additive")
autoplot(tempr.decomp)
autoplot(cmort.decomp)
We can also consider each time series separately by plotting the lag correlations (as below with the cardiovascular mortality data).
## Plotting correlation:
layout(matrix(c(1:3, 1:3), ncol=2), height=c(.5,.5, .5))
par(mar=c(.2,.5,1.5,.5), oma=c(3.2,3,1.5,0.5), las=1)
par(mfrow=c(3,3))
n <- seq(1,18,2)
for (i in n){
plot(as.vector(cmort),lag(as.vector(cmort),n=i),
xlab= "C.Mort.", ylab=paste("Lag",i),
main = paste("Lag",i),xaxt='n',yaxt='n')
}
title(main = "Cardiovascular Mortality with Lags",outer = T)
We can also see this pattern confirmed in the acf plot (also called a correlogram). To measure dependence, we can use an autocorrelation function (ACF) in a time series model. ACF measures the linear predictability of the time series (\(x_1,x_2,...x_n\)) at time \(s\) using only its adjacent value at time \(t\) (Shumway and Stoffer 2006). This can be done by using a linear combination of inputs. It also ACF provides the ability to forecast the series at time \(s+1\) from the value of \(x\) at time \(t\) (\(x_s\)).
## Plotting ACF Cardiovascular Mortality
autoplot(acf(cmort.ts,plot=F, type = "correlation"))+labs(title="Correlogram of Cardiovascular Mortality, 1970-1980")+geom_vline(xintercept = .308,col="pink",lty=2) +theme_bw()
Recall from the univariate case that ACF can also help us assess stationarity. Prolonged negative or positive values indicate nonstationarity. These plots have a significant correlations that persist too long to be stationary.
## Plot temperature
g1 <- autoplot(acf(part.ts,plot=F, type = "correlation"))+labs(title="Correlogram of Particular Matter, 1970-1980")+theme_bw()
## Plot particulate matter
g2 <- autoplot(acf(tempr.ts,plot=F, type = "correlation"))+labs(title="Correlogram of Temperature, 1970-1980") +theme_bw()
grid.arrange(ncol=1,g1,g2)
We can also assess the relationship between two time series explicitly using the sample cross correlation. For some background, the sample CCF is the set of sample correlations between \(part_{t+h}\) and \(cmort_t\) for example, where h is the lag of (\(0, +/- 1, +/-2,...\)).
We can investigate this using ccf command in R.
pm_cmort.ccf <- ccf(part,cmort,type="correlation", plot = F)
autoplot(pm_cmort.ccf)+ggtitle("Cardiovascular Mortality and Particulate Matter CCF")
temp_cmort.ccf <- ccf(tempr,cmort,type="correlation", plot = F)
autoplot(temp_cmort.ccf)+ggtitle("Cardiovascular Mortality and Temperature CCF")
From these plots, it seems that peak particulate matter lead is around -0.1356. The peak temperatures (mean centered) is -.4433 which again suggests that temperature leads cardiovascular mortality.
Finally, we can learn a lot from simple scatterplots about relationship between cardiovascular mortality and temperature and/or particulate matter. The scatterplots indicate a possible nonlinear relation between mortality and the temperature, and a more linear pattern between mortality and particulate matter.
par(mfrow=c(1,2))
plot(tempr, cmort, xlab="Temperature", ylab="Mortality", main = "Mortality and \nTemperature")
lines(lowess(tempr, cmort),col="red")
plot(part,cmort,xlab="Particulate Matter Conc.",ylab = "Mortality", main = "Mortality and \n Particulate Matter")
lines(lowess(part,cmort),col="red")
We can further check the nonstationariy with the ADF test.
df <- cbind(Mortality = cmort.ts,PM = part.ts,Temp = tempr.ts)
#autoplot(df)+ggtitle("Time Series Cardiovascular Mortality, Temperature, PM Conc.")+theme(plot.title=element_text(hjust=0.5))
apply(df,2,adf.test)
## $Mortality
##
## Augmented Dickey-Fuller Test
##
## data: newX[, i]
## Dickey-Fuller = -5.4125, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
##
##
## $PM
##
## Augmented Dickey-Fuller Test
##
## data: newX[, i]
## Dickey-Fuller = -4.493, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
##
##
## $Temp
##
## Augmented Dickey-Fuller Test
##
## data: newX[, i]
## Dickey-Fuller = -4.4572, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
Based on our exploratory data analysis, we have a good sense of what to expect in a model:
So we can model this:
#Center temperature to account for scaling
temp <- tempr-mean(tempr)
temp2 <- temp^2
trend <- time(cmort) # time index
#Linear regression fit
summary(fit <- lm(cmort~ trend + temp + temp2 + part, na.action=NULL))
##
## Call:
## lm(formula = cmort ~ trend + temp + temp2 + part, na.action = NULL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.0760 -4.2153 -0.4878 3.7435 29.2448
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.831e+03 1.996e+02 14.19 < 2e-16 ***
## trend -1.396e+00 1.010e-01 -13.82 < 2e-16 ***
## temp -4.725e-01 3.162e-02 -14.94 < 2e-16 ***
## temp2 2.259e-02 2.827e-03 7.99 9.26e-15 ***
## part 2.554e-01 1.886e-02 13.54 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.385 on 503 degrees of freedom
## Multiple R-squared: 0.5954, Adjusted R-squared: 0.5922
## F-statistic: 185 on 4 and 503 DF, p-value: < 2.2e-16
#ANOVA table (compare to next line)
summary(aov(fit))
## Df Sum Sq Mean Sq F value Pr(>F)
## trend 1 10667 10667 261.62 <2e-16 ***
## temp 1 8607 8607 211.09 <2e-16 ***
## temp2 1 3429 3429 84.09 <2e-16 ***
## part 1 7476 7476 183.36 <2e-16 ***
## Residuals 503 20508 41
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(aov(lm(cmort~cbind(trend, temp, temp2, part))))
## Df Sum Sq Mean Sq F value Pr(>F)
## cbind(trend, temp, temp2, part) 4 30178 7545 185 <2e-16 ***
## Residuals 503 20508 41
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#AIC, BIC, AICc
num <- length(cmort) # sample size
AIC(fit)/num - log(2*pi) # AIC
## [1] 4.721732
BIC(fit)/num - log(2*pi) # BIC
## [1] 4.771699
(AICc <- log(sum(resid(fit)^2)/num) + (num+5)/(num-5-2)) #AiCc
## [1] 4.722062
We can further relate mean adjusted temperature and particulate levels to cardiovascular mortality.
#ACF
acf2(resid(fit), 52) # implies AR2
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.34 0.44 0.28 0.28 0.16 0.12 0.07 0.01 0.03 -0.05 -0.02 0.00 -0.04
## PACF 0.34 0.36 0.08 0.06 -0.05 -0.05 -0.02 -0.05 0.02 -0.06 0.00 0.06 -0.02
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF -0.02 0.01 -0.05 -0.01 -0.03 -0.06 -0.03 -0.03 -0.05 0.00 -0.02 -0.01
## PACF 0.00 0.04 -0.08 0.00 -0.01 -0.06 0.03 0.02 -0.03 0.05 -0.01 0.00
## [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF -0.05 0.03 -0.11 -0.02 -0.10 -0.02 -0.09 -0.03 -0.10 -0.08 -0.10 -0.07
## PACF -0.06 0.06 -0.11 0.00 -0.05 0.05 -0.04 0.04 -0.06 -0.06 -0.05 0.03
## [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48] [,49]
## ACF -0.07 -0.05 -0.05 -0.04 -0.03 -0.04 0.05 0.00 0.04 0.08 0.07 0.06
## PACF -0.02 0.02 -0.01 0.00 0.00 -0.01 0.08 -0.01 -0.01 0.08 -0.01 -0.01
## [,50] [,51] [,52]
## ACF 0.05 0.07 0.02
## PACF -0.03 0.03 -0.04
sarima(cmort, 2,0,0, xreg=cbind(trend,temp,temp2,part))
## initial value 1.849900
## iter 2 value 1.733730
## iter 3 value 1.701063
## iter 4 value 1.682463
## iter 5 value 1.657377
## iter 6 value 1.652444
## iter 7 value 1.641726
## iter 8 value 1.635302
## iter 9 value 1.630848
## iter 10 value 1.629286
## iter 11 value 1.628731
## iter 12 value 1.628646
## iter 13 value 1.628634
## iter 14 value 1.628633
## iter 15 value 1.628632
## iter 16 value 1.628628
## iter 17 value 1.628627
## iter 18 value 1.628627
## iter 19 value 1.628626
## iter 20 value 1.628625
## iter 21 value 1.628622
## iter 22 value 1.628618
## iter 23 value 1.628614
## iter 24 value 1.628612
## iter 25 value 1.628611
## iter 26 value 1.628610
## iter 27 value 1.628610
## iter 28 value 1.628609
## iter 29 value 1.628609
## iter 30 value 1.628608
## iter 31 value 1.628608
## iter 32 value 1.628608
## iter 32 value 1.628608
## iter 32 value 1.628608
## final value 1.628608
## converged
## initial value 1.630401
## iter 2 value 1.630393
## iter 3 value 1.630382
## iter 4 value 1.630381
## iter 5 value 1.630375
## iter 6 value 1.630370
## iter 7 value 1.630362
## iter 8 value 1.630354
## iter 9 value 1.630349
## iter 10 value 1.630347
## iter 11 value 1.630347
## iter 12 value 1.630347
## iter 13 value 1.630347
## iter 14 value 1.630346
## iter 15 value 1.630346
## iter 16 value 1.630346
## iter 17 value 1.630346
## iter 17 value 1.630346
## iter 17 value 1.630346
## final value 1.630346
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = xreg, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc,
## REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 intercept trend temp temp2 part
## 0.3848 0.4326 3075.1482 -1.5165 -0.0190 0.0154 0.1545
## s.e. 0.0436 0.0400 834.7286 0.4226 0.0495 0.0020 0.0272
##
## sigma^2 estimated as 26.01: log likelihood = -1549.04, aic = 3114.07
##
## $degrees_of_freedom
## [1] 501
##
## $ttable
## Estimate SE t.value p.value
## ar1 0.3848 0.0436 8.8329 0.0000
## ar2 0.4326 0.0400 10.8062 0.0000
## intercept 3075.1482 834.7286 3.6840 0.0003
## trend -1.5165 0.4226 -3.5881 0.0004
## temp -0.0190 0.0495 -0.3837 0.7014
## temp2 0.0154 0.0020 7.6117 0.0000
## part 0.1545 0.0272 5.6803 0.0000
##
## $AIC
## [1] 6.130066
##
## $AICc
## [1] 6.130507
##
## $BIC
## [1] 6.196687
We assume that the current value at time t is white noise temporarily. The sample ACF and partial ACF (PACF) of the residuals from the ordinary least squares fit of the model. The residual analysis output from sarima shows no obvious departure of the residuals from white noise.
Auguie, Baptiste. 2017. GridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. https://www.jstatsoft.org/v40/i03/.
Harrell Jr, Frank E, with contributions from Charles Dupont, and many others. 2020. Hmisc: Harrell Miscellaneous. https://CRAN.R-project.org/package=Hmisc.
Horikoshi, Masaaki, and Yuan Tang. 2018. Ggfortify: Data Visualization Tools for Statistical Analysis Results. https://CRAN.R-project.org/package=ggfortify.
Hyndman, Rob, and George Athanasopoulos. n.d. Forecasting: Principles and Practice (3rd Ed). 3rd ed. Accessed May 2, 2021. https://Otexts.com/fpp3/.
Hyndman, Rob J, and Yeasmin Khandakar. 2008. “Automatic Time Series Forecasting: The Forecast Package for R.” Journal of Statistical Software 26 (3): 1–22. https://www.jstatsoft.org/article/view/v027i03.
Peterson, Brian G., and Peter Carl. 2020. PerformanceAnalytics: Econometric Tools for Performance and Risk Analysis. https://CRAN.R-project.org/package=PerformanceAnalytics.
Shumway, Robert H., and David S. Stoffer. 2006. Time Series Analysis and Its Applications: With R Examples. 2nd [updated] ed. Springer Texts in Statistics. New York: Springer.
———. 2017. Time Series Analysis and Its Applications: With R Examples. Springer Texts in Statistics. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-52452-8.
Stoffer, David. 2020. Astsa: Applied Statistical Time Series Analysis. https://CRAN.R-project.org/package=astsa.
Trapletti, Adrian, and Kurt Hornik. 2020. Tseries: Time Series Analysis and Computational Finance. https://CRAN.R-project.org/package=tseries.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.